Let’s install needed packages.

The first detailed example will be explained for European countries.

Similar analyses for European regions, US states and world countries will follow.

1 Step 1. importing raster data and map into R

1.1 Raster data

In this tutorial I used DMSP-OLS data for 2013 zipped in this archive – a specific file is called F182013.v4c_web.stable_lights.avg_vis.tif (it’s size - 692 MB). Due to GitHub file size limitations you have to download this file manually and store in the local files folder.

Raster data is commonly used to represent spatially continuous phenomena. A raster divides the world into a grid of equally sized rectangles (referred to as cells or, in the context of satellite remote sensing, pixels) that all have one or more values (or missing values) for the variables of interest.

The raster() function is used to import data in the raster package, if the file has only one layer (i.e. for each cell / pixel there is only one value stored in it) or the brick() function, if there are more layers.

Nighttime light data contains only one layer

1.2 Shapefile for European countries

I will limit the raster data only to the EU area. To do this, let’s also load shapefile file with a map of EU countries (obtained from Eurostat - a particular file used is UTS_RG_10M_2016_4326_LEVL_0.shp.zip, NUTS0 is the national level).

## Reading layer `NUTS_RG_10M_2013_4326_LEVL_0' from data source `C:\Users\Piotr\Dropbox\__biezace\2019-07-09_UseR_Toulouse\_prez\files\NUTS_RG_10M_2013_4326_LEVL_0.shp' using driver `ESRI Shapefile'
## Simple feature collection with 35 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -63.08826 ymin: -21.38731 xmax: 55.83663 ymax: 71.17354
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

Let’s limit the map to the mainland (we cut all the islands). For spatial limitation of sf objects one can use a function st_crop(), in which the second argument is the object defining the bounding box.

##      xmin      ymin      xmax      ymax 
## -63.08826 -21.38731  55.83663  71.17354
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries

Lets see the map again.

One can also use ggplot2 for mapping sf objects.

Presenting the values of the selected variable requires mapping its name on the fill aestetics.

2 Step 2. checking and adjusting

To be able to impose a map on raster data, one has to make sure that THE SAME geographical coordinate projection is used in BOTH datasets.

Let’s check what are the current values of these parameters.

## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## [1] "+proj=longlat +datum=WGS84 +no_defs"

Seems identical (WGS84).

If there is a need to unify the projections, one can simply copy the one used in the raster file into the map. The st_transform() function can be used for this:

3 Step 3. limitation of the extent of raster data

To apply spatial limitation of the raster data set one has to use a function crop(), in which the second argument is the extent of the area. It must be an Extent class object, resulting from the use of the extent() function.

Lets see the extent of our raster data and for the map:

## class      : Extent 
## xmin       : -180.0042 
## xmax       : 180.0042 
## ymin       : -65.00417 
## ymax       : 75.00417
## class      : Extent 
## xmin       : -10.47241 
## xmax       : 44.82055 
## ymin       : 34.5634 
## ymax       : 71.15304

We crop the raster data to the rectangle, where the European Union is located.

Let’s see the result.

## png 
##   2

One can clearly see European continent and major cities. Lets super-impose the map of EU regions.

## png 
##   2

This data still contains areas outside the EU. Raster data for pixels outside the polygons can be reemoved (set to missing) with the mask() function.

The more polygons (regions), the more time it takes (here ca. 40 seconds).

Let’s display data truncated to the EU area imposing once again the grid of countries borders.

## png 
##   2

Data have been correctly restricted to the European Union area.

4 Step 4. aggregation to spatial units

Very efficient identification and aggregation of data for selected areas can be done by using the cellnumbers() function from the tabularaster package by Michael D. Sumner. It allows to identify numbers (indexes) of cells located in every region (polygon in the map object). And aggregation based on indexed data is ca. 500 times faster than with using purely the extract() function.

The only problem here is that some of the countries have several islands (the geometry of type MULTIPOLYGON) which is not YET SUPPORTED by cellnumbers().

So for this simple example lets convert all geometries to just POLYGONs (inc case of a MULTIPOLYGON only the first will be used).

And finally the aggregation.

Lets add a new column to the map data frame.

and plot it on a map

5 Correlations with statistical data

Lets in addtion import data on GDP and population from Eurostat.

## Parsed with column specification:
## cols(
##   NUTS_ID = col_character(),
##   population2013 = col_double(),
##   gdpcap2013 = col_double(),
##   gdp2013 = col_double()
## )

To merge them correctly I convert all values to character.

Lets check correlations between lights, log(lights) and GDP, GDP per capita and population.

##                population2013    gdp2013   gdpcap2013   lights2013
## population2013      1.0000000 0.88636395 -0.114257199  0.877584427
## gdp2013             0.8863639 1.00000000  0.068928187  0.876565845
## gdpcap2013         -0.1142572 0.06892819  1.000000000 -0.002230403
## lights2013          0.8775844 0.87656584 -0.002230403  1.000000000
## l_lights2013        0.6720392 0.62781102 -0.238638955  0.775179193
##                l_lights2013
## population2013    0.6720392
## gdp2013           0.6278110
## gdpcap2013       -0.2386390
## lights2013        0.7751792
## l_lights2013      1.0000000

Scatterplot between GDP and lights.

Scatterplot between population and lights.

6 Analysis for European NUTS 2 regions

6.1 Importing shapefile

Import shapefile for European NUTS2 regions and limit its area to the mainland of Europe (cut the islands as before).

## Reading layer `NUTS_RG_10M_2013_4326_LEVL_2' from data source `C:\Users\Piotr\Dropbox\__biezace\2019-07-09_UseR_Toulouse\_prez\files\NUTS_RG_10M_2013_4326_LEVL_2.shp' using driver `ESRI Shapefile'
## Simple feature collection with 320 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -63.08826 ymin: -21.38731 xmax: 55.83663 ymax: 71.17354
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Error in CPL_geos_op2(op, st_geometry(x), st_geometry(y)): Evaluation error: TopologyException: Input geom 0 is invalid: Self-intersection at or near point 13.167791844340858 66.658418257526989 at 13.167791844340858 66.658418257526989.

6.2 Checking validity

To be able to limit the raster data to the desired area one has to make sure that should check if all geometries in the sf object are valid. Intuitively the two-dimensional geometry (POLYGON) is valid, if its boundaries are created from subsequent connecting, but non-overlapping and non-intersecting sections. MULTIPOLYGON geometry is valid if it consists of valid POLYGONs.

Checking the validity of the geometries of the analyzed objects can be time-consuming (especially for very complex geometries), but it saves problems in their subsequent visualization or analysis.

## 
## FALSE  TRUE 
##     9   311

Nine regions have incorrect geometries. They might me automatically corrected with st_make_valid() function from lwgeom package, which is the extension of sf. Its description can be found here: github.com/r-spatial/lwgeom.

And apply the crop() function once again.

## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries

Now it works fine. Next we adjust the coordinates.

And plot the map of EU NUTS2 regions superimposed on night-time lights distribution.

## png 
##   2

6.3 Aggregation

For the use of cellnumbers() we need to simplify complex geometries to just POLYGONs.

And apply aggregation.

Lets add a new column to the map data frame.

## Error in `[[<-.data.frame`(`*tmp*`, i, value = c(72969, 361161, 207187, : replacement has 310 rows, data has 311

Lights data is missing for one region - which?

## [1] 294
## Simple feature collection with 1 feature and 5 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 11.92691 ymin: 59.79043 xmax: 11.92696 ymax: 59.79046
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##     NUTS_ID LEVL_CODE CNTR_CODE        NUTS_NAME  FID
## 303    NO01         2        NO Oslo og Akershus NO01
##                           geometry
## 303 POLYGON ((11.92691 59.79046...

This region has a very strange shape – for simplicity, let’s assume it has the intensity of night lights equal to 0.

Add a new column to the map data frame.

And finally plot it on a map

## Saving 10 x 5 in image

6.4 Importing data from Eurostat

Lets in addtion import data on GDP and population for regions directly from Eurostat - with the use of eurostat package – see package tutorial.

To download data, you need to know their codes used by Eurostat.

This can be checked on the Eurostat website or in particular reports, for example GDP at regional level, Population statistics at regional level.

You can also check them by searching the Eurostat database by keywords:

## Registered S3 method overwritten by 'cli':
##   method     from    
##   print.boxx spatstat
## # A tibble: 6 x 3
##   title                                                 code         type  
##   <chr>                                                 <chr>        <chr> 
## 1 GDP and main components (output, expenditure and inc~ nama_10_gdp  datas~
## 2 GDP and main components  (output, expenditure and in~ namq_10_gdp  datas~
## 3 Gross domestic product (GDP) at current market price~ nama_10r_2g~ datas~
## 4 Average annual population to calculate regional GDP ~ nama_10r_3p~ datas~
## 5 Gross domestic product (GDP) at current market price~ nama_10r_3g~ datas~
## 6 European Union trade mark (EUTM) applications per bi~ ipr_ta_gdpr  datas~

Let’s download the GDP figures in the NUTS2 regions (nama_10r_2gdp).

## Table nama_10r_2gdp cached at C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_2gdp_num_code_TF.rds

Similarly for population figures.

## Table nama_10r_3popgdp cached at C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_3popgdp_num_code_TF.rds

We need to add this data to the sf object with geometries - all datasets use common Eurostat codes for regions, but columns have different names.

## Warning: Column `NUTS_ID`/`geo` joining factors with different levels,
## coercing to character vector
## Warning: Column `NUTS_ID`/`geo` joining character vector and factor,
## coercing into character vector

6.5 Correlations

Finally we check for correlations.

##              lights2013   gdp2013   pop2013 l_lights2013
## lights2013    1.0000000 0.3516237 0.4219286    0.6588570
## gdp2013       0.3516237 1.0000000 0.8744452    0.2271030
## pop2013       0.4219286 0.8744452 1.0000000    0.3542235
## l_lights2013  0.6588570 0.2271030 0.3542235    1.0000000

And make a scatterplot between GDP and lights

## Warning: Removed 64 rows containing non-finite values (stat_smooth).
## Warning: Removed 64 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_label_repel).

Scatterplot between ligts and population in NUTS2 regions.

## Warning: Removed 54 rows containing non-finite values (stat_smooth).
## Warning: Removed 54 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_label_repel).

7 Analysis for NUTS 3 regions

7.1 Importing shapefile

## Reading layer `NUTS_RG_10M_2013_4326_LEVL_3' from data source `C:\Users\Piotr\Dropbox\__biezace\2019-07-09_UseR_Toulouse\_prez\files\NUTS_RG_10M_2013_4326_LEVL_3.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1480 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -63.08826 ymin: -21.38731 xmax: 55.83663 ymax: 71.17354
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

7.2 Validation and adjustment

## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries

Map of EU NUTS3 regions imposed on night-time lights intensity.

## png 
##   2

7.4 Data from Eurostat for NUTS3 regions

## Table nama_10r_3gdp cached at C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_3gdp_num_code_TF.rds
## Reading cache file C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_3popgdp_num_code_TF.rds
## Table  nama_10r_3popgdp  read from cache file:  C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_3popgdp_num_code_TF.rds

Merging the dada with the sf object.

## Warning: Column `NUTS_ID`/`geo` joining factors with different levels,
## coercing to character vector
## Warning: Column `NUTS_ID`/`geo` joining character vector and factor,
## coercing into character vector

And checking correlations.

##              lights2013   gdp2013   pop2013 l_lights2013
## lights2013    1.0000000 0.4205909 0.5163390    0.8188150
## gdp2013       0.4205909 1.0000000 0.9006870    0.3250995
## pop2013       0.5163390 0.9006870 1.0000000    0.4703375
## l_lights2013  0.8188150 0.3250995 0.4703375    1.0000000

Scatterplot between lights and GDP

## Warning: Removed 363 rows containing non-finite values (stat_smooth).
## Warning: Removed 363 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_label_repel).

Scatterplot between population and nighttime lights intensity.

8 Analysis for US states

8.1 Import shapefile

## Reading layer `tl_2018_us_state' from data source `C:\Users\Piotr\Dropbox\__biezace\2019-07-09_UseR_Toulouse\_prez\files\tl_2018_us_state.shp' using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs

8.2 Adjusting

Adjusting coordinates projections between shapefile and raster data.

Checking validity.

## 
## TRUE 
##   56

Limiting the bounding box.

## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries

8.3 Limiting raster data to the area of US

Then raster image is cropped to the are of US.

## png 
##   2

One can clearly identify major US cities.

Masking the data for areas outside USA.

And display data truncated to the US area imposing once again the grid of states borders.

## png 
##   2

8.5 Correlation with census data

Importing data about state GDP and population.

## Parsed with column specification:
## cols(
##   GeoFips = col_character(),
##   code = col_character(),
##   GeoName = col_character(),
##   gdp2013 = col_double(),
##   gdp2016 = col_double(),
##   pop2013 = col_double(),
##   pop2016 = col_double()
## )

To merge the data correctly let’s convert all state codes in a map file to character values.

Correlation matrix

##              lights2013 l_lights2013   gdp2013   pop2013
## lights2013    1.0000000    0.8163420 0.7847893 0.8331715
## l_lights2013  0.8163420    1.0000000 0.5669594 0.6269202
## gdp2013       0.7847893    0.5669594 1.0000000 0.9845955
## pop2013       0.8331715    0.6269202 0.9845955 1.0000000

Scatterplot between GDP and lights in US states.

## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

Scatterplot between population and lights in US states.

## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

9 Analysis for world countries

9.1 Importing map data

In case of world counties lets use wrld_simpl object from the maptools package converted to sf object.

9.2 Checking and adjusting

Lets adjust the coordinates projections

## [1] "+proj=longlat +datum=WGS84 +no_defs"
## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

and check validity of geometries.

## 
## FALSE  TRUE 
##     4   242

Let’s also convert the type of geometries to POLYGONs.

## Warning in st_cast.sf(map_world, "POLYGON"): repeating attributes for all
## sub-geometries for which they may not be constant

It appears that the extent of the map is larger than the extent of the raster data.

## class      : Extent 
## xmin       : -180 
## xmax       : 180 
## ymin       : -90 
## ymax       : 83.57027
## class      : Extent 
## xmin       : -180.0042 
## xmax       : 180.0042 
## ymin       : -65.00417 
## ymax       : 75.00417

Let’s limit the map to the extent of raster object.

## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries

9.3 Data aggregation

As operating on the whole raster data may be too demanding (in this case R will try to read all data into memory), we will make the aggregation using a loop and sum night-time lights intensity for every country separately (in fact every POLYGON as we have just converted original data into POLYGONs).

CAUTION! This loop run for about 45 minutes on my laptop. Let me know if you find a more (time) efficient way :)

Let’s collect the calculated values in a numeric vector.

Let’s add the values to the map data and finally aggregate over POLYGONs within countries.

Finally NTLI data by countries are added to the original map data.

## Joining, by = "ISO2"

9.4 Importing data from WDI

Data about night-time lights intensity will be correlated with GDP and population, similarly as in previous examples. For countries of the world such data can be imported in R directrly from the World Development Indicators database with the WDI package.

To request the indicator GDP (Current US$) we need to use its code NY.GDP.MKTP.CD.

One can also search for the code with the WDIsearch() function.

##       indicator             
##  [1,] "BX.TRF.PWKR.GD.ZS"   
##  [2,] "BX.TRF.PWKR.DT.GD.ZS"
##  [3,] "BX.TRF.MGR.DT.GD.ZS" 
##  [4,] "BX.KLT.DINV.WD.GD.ZS"
##  [5,] "BX.KLT.DINV.DT.GD.ZS"
##  [6,] "BX.GSR.MRCH.ZS"      
##  [7,] "6.0.GDPpc_constant"  
##  [8,] "6.0.GDP_usd"         
##  [9,] "6.0.GDP_growth"      
## [10,] "6.0.GDP_current"     
##       name                                                  
##  [1,] "Workers' remittances, receipts (% of GDP)"           
##  [2,] "Personal remittances, received (% of GDP)"           
##  [3,] "Migrant remittance inflows (% of GDP)"               
##  [4,] "Foreign direct investment, net inflows (% of GDP)"   
##  [5,] "Foreign direct investment, net inflows (% of GDP)"   
##  [6,] "Merchandise exports (BOP): percentage of GDP (%)"    
##  [7,] "GDP per capita, PPP (constant 2011 international $) "
##  [8,] "GDP (constant 2005 $)"                               
##  [9,] "GDP growth (annual %)"                               
## [10,] "GDP (current $)"

We also need data on population:

Let’s put both data together.

##   iso2c                                       country year NY.GDP.MKTP.CD
## 1    1A                                    Arab World 2013   2.867265e+12
## 2    1W                                         World 2013   7.718995e+13
## 3    4E   East Asia & Pacific (excluding high income) 2013   1.182778e+13
## 4    7E Europe & Central Asia (excluding high income) 2013   4.314982e+12
## 5    8S                                    South Asia 2013   2.357175e+12
## 6    AD                                       Andorra 2013   3.281585e+09
##   SP.POP.TOTL
## 1   379705719
## 2  7170961674
## 3  2008932012
## 4   405778196
## 5  1705772050
## 6       80774

There is a column called iso2c which includes the same country codes as ISO2 in the map file, so data can be easily put together with the map object. Lets give informative names to the last two columns.

## Warning: Column `ISO2`/`iso2c` joining factor and character vector,
## coercing into character vector

9.5 Correlations

In the last step we will calculate the correlation matrix.

##              lights2013 l_lights2013   gdp2013   pop2013
## lights2013    1.0000000    0.3891884 0.9150627 0.5713779
## l_lights2013  0.3891884    1.0000000 0.3923265 0.3312246
## gdp2013       0.9150627    0.3923265 1.0000000 0.5338668
## pop2013       0.5713779    0.3312246 0.5338668 1.0000000

and present relationships graphically.

## Warning: Removed 43 rows containing non-finite values (stat_smooth).
## Warning: Removed 43 rows containing missing values (geom_point).

And the same for population vs night-time lights.

## Warning: Removed 35 rows containing non-finite values (stat_smooth).
## Warning: Removed 35 rows containing missing values (geom_point).